Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

Neural Information Processing Systems

Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from scratch, which would have been prohibitively expensive. Additionally, the inefficiency of deep RL typically excludes researchers without access to industrial-scale resources from tackling computationally demanding problems. To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another. As a step towards enabling reincarnating RL from any agent to any other agent, we focus on the specific setting of efficiently transferring an existing sub-optimal policy to a standalone value-based RL agent. We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations. Equipped with this algorithm, we demonstrate reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons. Overall, this work argues for an alternative approach to RL research, which we believe could significantly improve real-world RL adoption and help democratize it further.
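The specific setting in the abstract, transferring an existing sub-optimal policy into a value-based student agent, can be illustrated as a TD loss combined with an annealed distillation term toward the teacher. The function below is a minimal NumPy sketch under assumed names and shapes; it is not the paper's actual algorithm or API.

```python
import numpy as np

def reincarnation_loss(q_values, td_targets, teacher_probs, decay):
    """Blend a standard TD loss with a distillation loss toward an
    existing (possibly sub-optimal) teacher policy.

    `decay` anneals the teacher's influence: 1.0 leans on the teacher,
    0.0 recovers plain tabula-rasa TD learning.
      q_values:      (batch, actions) student Q-estimates
      td_targets:    (batch,) bootstrapped targets
      teacher_probs: (batch, actions) teacher policy distribution
    """
    # TD loss on the greedy Q-value (a real agent would index the
    # action actually taken in each transition).
    td_loss = np.mean((q_values.max(axis=1) - td_targets) ** 2)
    # Cross-entropy between the teacher policy and the student's
    # softmax over its Q-values.
    logits = q_values - q_values.max(axis=1, keepdims=True)
    student = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    distill = -np.mean((teacher_probs * np.log(student + 1e-8)).sum(axis=1))
    return td_loss + decay * distill
```

Annealing `decay` toward zero over training lets the student eventually surpass the sub-optimal teacher rather than merely imitating it.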


Accelerating Meta-Learning by Sharing Gradients

Chang, Oscar, Lipson, Hod

arXiv.org Artificial Intelligence

The success of gradient-based meta-learning is primarily attributed to its ability to leverage related tasks to learn task-invariant information. However, the absence of interactions between different tasks in the inner loop leads to task-specific over-fitting in the initial phase of meta-training. While this is eventually corrected by the presence of these interactions in the outer loop, it comes at a significant cost of slower meta-learning. To address this limitation, we explicitly encode task relatedness via an inner loop regularization mechanism inspired by multi-task learning. Our algorithm shares gradient information from previously encountered tasks as well as concurrent tasks in the same task batch, and scales their contribution with meta-learned parameters. We show using two popular few-shot classification datasets that gradient sharing enables meta-learning under bigger inner loop learning rates and can accelerate the meta-training process by up to 134%.
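The inner-loop regularization the abstract describes, blending the current task's gradient with gradient information shared from other tasks, can be sketched as follows. The function and the simple convex blend are illustrative assumptions; `mix` stands in for the paper's meta-learned scaling parameters.

```python
import numpy as np

def shared_inner_step(params, task_grad, shared_grad, lr, mix):
    """One inner-loop update that regularizes the current task's
    gradient with gradients pooled from previously encountered and
    concurrent tasks, weighted by a meta-learned coefficient `mix`."""
    blended = (1.0 - mix) * task_grad + mix * shared_grad
    return params - lr * blended
```

Because the shared gradient damps task-specific directions, such a scheme can tolerate larger inner-loop learning rates than a purely per-task update.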


'World of Warcraft' Has a Lot to Teach the Twitter Clones

WIRED

Another week, another catastrophic failure of policy at Twitter that's being eagerly exploited by its myriad competitors--and they truly are myriad. And yet, in spite of the momentary success of some of these platforms--Threads has gotten over 70 million signups as of this writing--none has quite ascended to the lofty heights of Twitter's influence at its height, where it seemed, for good or for ill (let's be honest, mostly ill), to be at the heart of every conversation among our world's epistemic elites. To understand why, we have to go to Azeroth. Black Mirror creator Charlie Brooker once, tongue-in-cheek, called Twitter the best video game of all time, likening it to the then-still-popular wave of Massively Multiplayer Online Roleplaying Games, or MMORPGs, that were led by titles like World of Warcraft. Aside from the obvious connections--adopting an online persona in a gamified system entirely governed by earned metrics--we can also look at the fact that Twitter, like World of Warcraft, is surrounded by failed imitators.


Beyond Tabula Rasa: Reincarnating Reinforcement Learning - Mila

#artificialintelligence

Reinforcement learning (RL) is an area of machine learning that focuses on training intelligent agents using related experiences so they can learn to solve decision making tasks, such as playing video games, flying stratospheric balloons, and designing hardware chips. Due to the generality of RL, the prevalent trend in RL research is to develop agents that can efficiently learn tabula rasa, that is, from scratch without using previously learned knowledge about the problem. However, in practice, tabula rasa RL systems are typically the exception rather than the norm for solving large-scale RL problems. Large-scale RL systems, such as OpenAI Five, which achieves human-level performance on Dota 2, undergo multiple design changes (e.g., algorithmic or architectural changes) during their developmental cycle. This modification process can last months and necessitates incorporating such changes without re-training from scratch, which would be prohibitively expensive.


Training AI: Reward is not enough

#artificialintelligence

This post was written for TechTalks by Herbert Roitblat, the author of Algorithms Are Not Enough: How to Create Artificial General Intelligence. In a recent paper, the DeepMind team (Silver et al., 2021) argue that rewards are enough for all kinds of intelligence. Specifically, they argue that "maximizing reward is enough to drive behavior that exhibits most if not all attributes of intelligence." They argue that simple rewards are all that is needed for agents in rich environments to develop multi-attribute intelligence of the sort needed to achieve artificial general intelligence. This sounds like a bold claim, but, in fact, it is so vague as to be almost meaningless. They support their thesis, not by offering specific evidence, but by repeatedly asserting that reward is enough because the observed solutions to the problems are consistent with the problem having been solved.


The Art to Start: Tabula Rasa

#artificialintelligence

As we have seen, GPT-3 can write from scratch -- and in our series "The Art to Start," you will learn how to "scratch." Yet it also works without any prompt: you can click "submit" and be surprised by the results. Without a prompt, GPT-3 chooses entirely random content. Back in the 1920s, Dadaists and Surrealists (most prominently André Breton) explored their creativity using the method of Écriture Automatique: "automatic writing," done without consciously censoring the results.


Noam Chomsky on the Future of Deep Learning

#artificialintelligence

For the past few weeks, I've been engaged in an email exchange with my favourite anarcho-syndicalist Noam Chomsky. I reached out to him initially to ask whether recent developments in ANNs (artificial neural networks) had caused him to reconsider his famous linguistic theory Universal Grammar. Our conversation touched on the possible limitations of Deep Learning, how well ANNs really model biological brains, and also meandered into more philosophical territory. I'm not going to quote Professor Chomsky directly in this article as our discussion was informal, but I will attempt to summarise the key takeaways. Noam Chomsky is first and foremost a professor of linguistics (considered by many to be "the father of modern linguistics") but he is probably better known outside of academic circles as an activist, philosopher and historian.


Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning

Chadha, Parth

arXiv.org Artificial Intelligence

Model-free deep reinforcement learning (RL) agents can learn an effective policy directly from repeated interactions with a black-box environment. However, in practice, these algorithms often require large amounts of training experience to learn and generalize well. In addition, classic model-free learning ignores the domain information contained in the state transition tuples. Model-based RL, on the other hand, attempts to learn a model of the environment from experience and is substantially more sample efficient, but suffers from significant asymptotic bias owing to the imperfect dynamics model. In this paper, we propose a gradient matching algorithm to improve sample efficiency by utilizing target slope information from the dynamics predictor to aid the model-free learner. We demonstrate this by presenting a technique for matching the gradient information from the model-based learner with the model-free component in an abstract low-dimensional space and validate the proposed technique through experimental results that demonstrate the efficacy of this approach.
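The core idea, penalizing disagreement between the model-free learner's slope and the slope predicted by the dynamics model, can be written as a simple auxiliary loss. This is a minimal sketch with assumed names; the paper performs the matching in an abstract low-dimensional space rather than on raw gradients.

```python
import numpy as np

def gradient_matching_penalty(mf_grad, mb_grad, weight):
    """Auxiliary loss nudging the model-free learner's value slope
    (its gradient w.r.t. the state representation) toward the slope
    derived from the learned dynamics predictor."""
    return weight * np.mean((mf_grad - mb_grad) ** 2)
```

Added to the usual model-free objective, the penalty injects the domain information carried by the dynamics model without bootstrapping from its (biased) rollouts.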


Privileged Information Dropout in Reinforcement Learning

Kamienny, Pierre-Alexandre, Arulkumaran, Kai, Behbahani, Feryal, Boehmer, Wendelin, Whiteson, Shimon

arXiv.org Artificial Intelligence

Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and less commonly in the form of augmenting the inputs of agents. In this work, we investigate Privileged Information Dropout (PI-Dropout) for achieving the latter which can be applied equally to value-based and policy-based RL algorithms. Within a simple partially-observed environment, we demonstrate that PI-Dropout outperforms alternatives for leveraging privileged information, including distillation and auxiliary tasks, and can successfully utilise different types of privileged information. Finally, we analyse its effect on the learned representations.
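Augmenting the agent's inputs with privileged information, as the abstract describes, can be sketched as dropout applied only to the privileged channels. The function and parameter names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pi_dropout(obs, priv_obs, p, training, rng):
    """Concatenate regular and privileged observations, applying
    dropout to the privileged channels during training so the policy
    cannot come to depend on them; at test time, when the privileged
    signal is unavailable, those channels are zeroed."""
    if training:
        mask = (rng.random(priv_obs.shape) > p).astype(priv_obs.dtype)
        priv = priv_obs * mask / (1.0 - p)  # inverted-dropout scaling
    else:
        priv = np.zeros_like(priv_obs)
    return np.concatenate([obs, priv], axis=-1)
```

Because the network's input size is fixed, the same trained weights serve both regimes; only the privileged channels change between training and deployment.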